# Reinforcement Learning Tuning
## MiMo 7B RL

MiMo-7B-RL is a reinforcement learning model trained on top of MiMo-7B-SFT. It delivers outstanding performance on mathematical and code reasoning tasks, comparable to OpenAI o1-mini.

- License: MIT
- Tags: Large Language Model, Transformers
- Author: XiaomiMiMo
- Downloads: 11.79k · Likes: 252
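Because the card tags Transformers, a minimal loading sketch follows. The repo id `XiaomiMiMo/MiMo-7B-RL` and the `trust_remote_code` flag are assumptions inferred from the card's author and name, not details stated in the listing.

```python
# Minimal sketch: load MiMo-7B-RL with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"  # assumed repo id (card author + name)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # spread weights across available devices
    trust_remote_code=True,  # assumed: the repo may ship custom model code
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```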
## Meta Llama 3 70B Fp8

Meta Llama 3 70B is a large language model developed by Meta, with 70 billion parameters and an 8k context length, intended for English-language commercial and research use. As the name indicates, this listing is an FP8-quantized build of the model.

- License: Other
- Tags: Large Language Model, Transformers, English
- Author: FriendliAI
- Downloads: 34 · Likes: 5
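FP8 checkpoints are usually served through an inference engine with FP8 kernel support rather than loaded directly. Below is a sketch using vLLM; the repo id, the `quantization` setting, and the 4-GPU split are illustrative assumptions, and FP8 inference requires hardware that supports it (e.g. Hopper-class GPUs).

```python
# Minimal sketch: serve an FP8 Llama 3 70B checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="FriendliAI/Meta-Llama-3-70B-fp8",  # assumed repo id
    quantization="fp8",        # weights are pre-quantized to FP8
    tensor_parallel_size=4,    # example split of the 70B weights over 4 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize Llama 3 70B's intended uses."], params)
print(outputs[0].outputs[0].text)
```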
## Meta Llama 3 8B Instruct GGUF

This is a GGUF-quantized version of the 8-billion-parameter instruction-tuned model from the Meta Llama 3 series. It is optimized for dialogue and performs strongly across multiple benchmarks.

- Tags: Large Language Model, English
- Author: MaziyarPanahi
- Downloads: 293.90k · Likes: 88
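GGUF files target llama.cpp-style runtimes rather than Transformers. A minimal sketch with llama-cpp-python follows; the repo id and quantization filename are assumptions based on the card's author and name.

```python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",  # assumed file pattern; any quant level works
    n_ctx=8192,               # Llama 3's 8k context window
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```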
## PPO LunarLander V2

This is a reinforcement learning model trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.

- Tags: Reinforcement Learning
- Author: araffin
- Downloads: 65 · Likes: 18
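For context, the sketch below trains a comparable PPO agent from scratch with Stable-Baselines3; it is an illustrative recipe with default hyperparameters, not the exact configuration behind this hub model.

```python
# Minimal sketch: train and evaluate PPO on LunarLander-v2.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# LunarLander-v2 needs the box2d extra: pip install "gymnasium[box2d]"
env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # short run; the hub model trains longer

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```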